ABSTRACT

This paper discusses the problem of context-free position estimation
using a stereo vision system with movable eyes. Exact and approximate
equations are developed linking position to measurable quantities of
the image space, and an algorithm for finding these quantities is
suggested in rough form. An estimate of errors and resolution limits
is provided.
I. Introduction

The intent of this study is two-fold:

i) to determine quantitatively the nature and amount
of additional information presented by a stereo
(as opposed to monoscopic) visual apparatus;
ii) to investigate qualitatively some useful ways
of incorporating this additional information
in an artificial visual scene analyser.

It may be noted at once that the only real distinction
between stereoscopic and monocular vision is that the latter
presents a single visual image of a scene, while the former
provides us with two images. This distinction quickly becomes
meaningless, however, unless a practical method exists of
comparing the two images and determining the differences
between them. For this reason I am forced, right at the
beginning, to address the question of "difference-measuring"
between visual images, and to state explicitly the assumptions
I have made concerning it.

My first assumption is that at some level of even current
vision programs, the image seen by a single eye is represented
as a 2-D matrix of measured light-intensity values, or could
be so represented without much difficulty.

My second assumption is that if a stereo eye system were
to be used, it would be mechanically* constrained so that
the "center points" of the 2-D image matrices were never
representative of different points in 3-space, i.e., that
a "point of trigonometric focus" existed, towards which both
eyes always "pointed". This focal point could freely shift
in distance away from the eyes, or closer, but the forward
axes of the eyes could not become significantly skew relative
to the limits of angular resolution. The eyes would be
capable of only single-degree-of-freedom motion with
respect to each other, about their vertical axes.

* A variety of feedback control systems, or even digital
control systems, can be imagined which might do this, and
yet allow the constraint to be removed if desired.

My third assumption, which is difficult to justify just
yet, is that if an element in one eye's image matrix were
selected, its counterpart in the other image could be found
from local evidence such that both represented the same point
in 3-space. This is rather a difficult exercise in pattern
matching in the general case, particularly since I understand
that high noise levels are present in the visual images,
but I will offer some results in Part III that can help
quite a bit in limiting the search. I'll come back to
this problem later; for now I'll just assume it is solvable.*

With these assumptions, we proceed to some mathematics
relevant to the (continuous) real-world situation.

* Lerman (1) has in fact presented results which demonstrate
that this type of pattern matching can be accomplished when
applied to images generated by eyes focused on infinity, the
only case he considered. See Part III.



II. Definitions, Co-ordinate Systems, and Consequences

Let us consider a fixed orthogonal reference co-ordinate
system S (the 'table' system), defined so that $\hat\imath$ is generally
'up', $\hat\jmath$ is generally 'right', and $\hat k$ is generally 'away'.
Presume that in this system the point midway between the
eyes is located out in the general $-\hat k$ direction at $\bar P_s$, and
that the eyes are focused (in the trigonometric sense) on $\bar F_s$.
A group of objects to be viewed lies near the origin, and
both $\bar P_s$ and $\bar F_s$ are known.

The S system all by itself is adequate for representing
the location of points in space, but it will be useful here
to define a few more for clarity. One alternative is the
system $J_o$ (see Fig. I), whose origin lies at $\bar P_s$, and whose
orientation is such that

$\hat\imath_o$ lies along $\hat\jmath_o \times \hat k_o$ (almost 'up'),

$\hat\jmath_o$ lies along $\hat k_o \times \hat\imath_s$ (horizontal, almost 'right'),

$\hat k_o$ lies along $\bar F_s - \bar P_s$ (towards $\bar F_s$).



Transformation between these systems is easily made through
the relation

$\bar X_s = T\,\bar X_o + \bar P_s$
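A minimal Python sketch of this construction, assuming the table axes are stored as 3-tuples with $\hat\imath_s = (1, 0, 0)$ as 'up'. The helper names are hypothetical, and building $T$ with $\hat\imath_o$, $\hat\jmath_o$, $\hat k_o$ (expressed in S) as its columns is one straightforward way to realize the relation above. It fails if the line of sight points straight up, where $\hat k_o \times \hat\imath_s$ vanishes.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    return tuple(x / norm for x in a)

def facial_axes(p_s, f_s, up_s=(1.0, 0.0, 0.0)):
    """Unit vectors i_o, j_o, k_o of the facial system J_o, expressed in S."""
    k_o = unit(sub(f_s, p_s))     # along F_s - P_s, towards the focal point
    j_o = unit(cross(k_o, up_s))  # along k_o x i_s: horizontal, almost 'right'
    i_o = cross(j_o, k_o)         # along j_o x k_o: almost 'up' (already unit)
    return i_o, j_o, k_o

def facial_to_table(x_o, p_s, f_s):
    """X_s = T X_o + P_s, where the columns of T are i_o, j_o, k_o."""
    i_o, j_o, k_o = facial_axes(p_s, f_s)
    return tuple(p + x_o[0] * a + x_o[1] * b + x_o[2] * c
                 for p, a, b, c in zip(p_s, i_o, j_o, k_o))
```

For example, with the eyes at $\bar P_s = (0, 0, -10)$ looking at the origin, `facial_to_table((0.0, 0.0, 10.0), (0.0, 0.0, -10.0), (0.0, 0.0, 0.0))` returns the focal point, i.e. the origin.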
It will be assumed that our eyes in the $J_o$ system lie
at $\pm\bar D$, where $\bar D = d(0,1,0)$, and it will be noted that $\bar F_s$
in this system appears to lie at $\bar F = f(0,0,1)$, where
$f = |\bar F_s - \bar P_s|$. Thus, in humans $J_o$ seems to be some sort
of 'facial' system, with $\hat k$ out the nose and $\hat\imath$ out the top
of the forehead. Here we have made certain, though, that
the eyes always focus only on points along the k-axis, or
'straight ahead'.*

* It is possible to generalize and allow the eyes to focus
on points other than straight ahead, but the algebra becomes
quite a bit more complicated. Since the 'head' can be
moved, this doesn't seem a serious restriction.

We will be interested in finding from measured quantities
the location of some point $\bar X_s$ in S, and it will simplify
matters if we let $\bar E_s = \bar X_s - \bar F_s$, and then find that vector
instead. We note quickly that

where

and proceed to the problem of finding $\bar\xi$ (the non-dimensional
displacement from $\bar F$ in the 'facial' system) in terms of
measurable quantities.

The quantities we will measure come from a comparison
of the two images recorded by the eyes. I assume that these
images are formed by the projection of distant points onto
planes perpendicular to the rays between the eyes themselves and
the focal point, $\bar F$. Defining two new systems, then, $J_l$ and $J_r$
(see Fig. II), with origins located in $J_o$ at $-\bar D$ and $+\bar D$
respectively, and oriented so that their k-axes point toward
$\bar F$ and their i-axes remain aligned with $\hat\imath_o$, we can find the
locations of any point $\bar X_o$ (in $J_o$) in the new systems as

where (the upper signs applying to the left eye, etc.)
If we define planes normal to $\hat k$ at $(0, 0, \sigma)$ in both $J_l$
and $J_r$, and then project a point $\bar X_o$ onto them, the
intersections will occur at

in the 'right' system and plane.

$\sigma$ is merely a scale factor determined by the optics alone.
The key quantities, which define the location of the projections
in the image planes of Part II, are, from (2-5) and (2-6),

where again, and in what follows, the upper signs apply to
the 'left' system.
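A sketch of these projections, under assumptions of our own rather than the author's definitions: $\bar\xi = (\bar X_o - \bar F)/d$, $\phi = f/d$, image planes at unit distance ($\sigma = 1$), and the point rotated into $J_l$ and $J_r$ exactly as described above before dividing by depth. It also forms the two-eye averages $\alpha$, $\beta$ and the half-shift $\Delta = (\beta_r - \beta_l)/2$; these definitions are assumptions chosen to match how the symbols are used later in the text. Function names are hypothetical.

```python
import math

def image_coordinates(xi, phi):
    """Return (alpha_l, beta_l, alpha_r, beta_r) for a point near the focus.

    xi  -- (xi1, xi2, xi3) = (X_o - F)/d, the displacement from the focal
           point in the facial system J_o, in units of d (assumed definition)
    phi -- f/d, the distance to the focal point in units of d (assumed)
    The image planes are taken at unit distance from each eye (sigma = 1).
    """
    x1, x2, x3 = xi
    c = 1.0 + phi * phi
    s = math.sqrt(c)
    coords = []
    for sign in (+1, -1):  # +1: left eye at -D, -1: right eye at +D
        depth = c + sign * x2 + phi * x3  # proportional to depth along the eye's k-axis
        if depth <= 0.0:
            raise ValueError("point lies behind the eye")
        coords.append(s * x1 / depth)                  # alpha: along i (almost 'up')
        coords.append((phi * x2 - sign * x3) / depth)  # beta: along the eye's j-axis
    return tuple(coords)

def measured_quantities(xi, phi):
    """alpha and beta averaged over both eyes, and Delta = (beta_r - beta_l)/2."""
    a_l, b_l, a_r, b_r = image_coordinates(xi, phi)
    return (a_l + a_r) / 2.0, (b_l + b_r) / 2.0, (b_r - b_l) / 2.0
```

For a point one unit beyond the focal point, `measured_quantities((0.0, 0.0, 1.0), 10.0)` gives $\alpha = \beta = 0$ and $\Delta = 1/111$: a positive half-shift for a point farther away than the focus.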

It will be convenient to make several definitions, both
to further non-dimensionalize the mathematics and to save
writing. We let

and note that equations (2-5) now are

and that equations (2-7), after some matrix algebra simplification,
reduce to

Their usefulness now begins to become apparent, for with (2-6),
we can solve for $\bar\xi$ in terms of $\alpha$, $\beta$, $\Delta$, $\phi$:
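Under assumed conventions of our own ($\beta_l = \beta - \Delta$, $\beta_r = \beta + \Delta$, eyes at $\mp d$ on the j-axis, unit image distance), each eye's $\beta$ equation becomes linear in $\xi_2$ and $\xi_3$ once cleared of its denominator, so $\bar\xi$ can be recovered exactly from a 2 x 2 system, with $\alpha$ then fixing $\xi_1$. This is a sketch of one exact inversion, not a reconstruction of the author's (2-10); the function name is hypothetical.

```python
import math

def solve_xi(alpha, beta, delta_half, phi):
    """Recover xi = (X_o - F)/d from alpha, beta, Delta and phi (exact).

    Assumed conventions: beta_l = beta - Delta, beta_r = beta + Delta,
    alpha is the two-eye average of the vertical image coordinate, the eyes
    sit at -d and +d on the j-axis, and the image planes are at unit distance.
    """
    c = 1.0 + phi * phi
    b_l, b_r = beta - delta_half, beta + delta_half
    # Cleared of denominators, each eye's beta equation is linear in x2, x3:
    #   b_l * (c + x2 + phi*x3) = phi*x2 - x3    (left eye)
    #   b_r * (c - x2 + phi*x3) = phi*x2 + x3    (right eye)
    a11, a12, r1 = b_l - phi, b_l * phi + 1.0, -b_l * c
    a21, a22, r2 = -b_r - phi, b_r * phi - 1.0, -b_r * c
    det = a11 * a22 - a12 * a21
    if det == 0.0:
        raise ValueError("singular system: cannot triangulate this point")
    x2 = (r1 * a22 - a12 * r2) / det  # Cramer's rule
    x3 = (a11 * r2 - r1 * a21) / det
    # Each eye sees alpha_eye = sqrt(c) * x1 / depth_eye; the average fixes x1.
    depth_l = c + x2 + phi * x3
    depth_r = c - x2 + phi * x3
    x1 = 2.0 * alpha / (math.sqrt(c) * (1.0 / depth_l + 1.0 / depth_r))
    return x1, x2, x3
```

For instance, `solve_xi(0.0, 0.0, 1.0 / 111.0, 10.0)` recovers a point one unit beyond the focus, (0, 0, 1), up to rounding; the function raises an error if the 2 x 2 system is singular.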
It would be nice (as I've indicated by the weird forms
of equations (2-10)) to simplify these with some approximations,
since in most cases of interest,

We can't do this just yet, though, because of the term $\Delta\phi^2$,
whose magnitude is unclear. We can get a handle on it, though,
from equation (2-3), where we noted that

It actually may be more useful to us than $\bar\xi$ in some
cases, because it represents the actual (dimensionless)
position of the point $\bar X_s$ in facial co-ordinates. We note that

and after some mathematics, we find that

This quantity, however, can be reduced with (2-11) to

and it is then clear that

Several cases are possible and interesting. 

A. $\phi \gg \rho$ (focusing on infinity).

In this case $(\rho - \phi)$ is 'a large negative number', and

Equation (2-13d) is the only interesting one. It allows us
to find the depth of any point by just focusing on infinity
and measuring $\Delta$.
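Under assumed conventions of our own (eyes at $\mp d$ on the j-axis, converged on the axis point at depth $\phi$ in units of d, unit image distance, $\Delta = (\beta_r - \beta_l)/2$), this case can be checked in closed form for points on the $\hat k$ axis. A point at dimensionless depth $\rho$ has half-shift $\Delta = (\rho - \phi)/(1 + \phi\rho)$, which tends to $-1/\rho$ as $\phi \to \infty$; inverting gives a depth from a measured $\Delta$. The sketch is limited to on-axis points and is our derivation, offered as a cross-check rather than as equation (2-13d) itself. Function names are hypothetical.

```python
def on_axis_half_shift(rho, phi):
    """Exact Delta for a point on the k-axis at depth rho (units of d).

    Assumes the eyes at -d and +d converge on the axis point at depth phi,
    with Delta = (beta_r - beta_l)/2 and image planes at unit distance.
    """
    return (rho - phi) / (1.0 + phi * rho)

def depth_from_half_shift(delta_half, phi=None):
    """Invert the relation above; phi=None stands for focusing on infinity."""
    if phi is None:
        return -1.0 / delta_half  # limit phi -> infinity: rho = -1/Delta
    return (phi + delta_half) / (1.0 - phi * delta_half)

# Pencil at arm's length (rho = 33) with the eyes focused very far away
print(on_axis_half_shift(33.0, 1e9))
```

With the pencil of Part IV at $\rho = 33$, `on_axis_half_shift(33.0, 1e9)` is about -0.0303, so the two images should sit about 0.06 apart, the negative sign marking a point nearer than the focus.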

B. $\rho \approx \phi$ (any point of nearly equal depth
with the point of focus).

In this case $(\rho - \phi)$ is a very small number, $\ll 1$, and

Here it is the first three which are of interest. They allow,
via simple calculations, the deduction of position relative
to a known focal point, $\bar F$.
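For the near-focus case, a first-order cross-check can be written down under our own reading of the geometry, which is an assumption and not the author's (2-13): $\bar\xi = (\bar X_o - \bar F)/d$, $\phi = f/d$, image planes at unit distance, $\alpha$ and $\beta$ the two-eye averages, and $\Delta = (\beta_r - \beta_l)/2$. With the upper signs for the left eye, the exact image coordinates are

$$\alpha_{l,r} = \frac{\sqrt{1+\phi^2}\,\xi_1}{1+\phi^2 \pm \xi_2 + \phi\,\xi_3}, \qquad \beta_{l,r} = \frac{\phi\,\xi_2 \mp \xi_3}{1+\phi^2 \pm \xi_2 + \phi\,\xi_3},$$

and keeping only terms linear in $\bar\xi$,

$$\xi_1 \approx \sqrt{1+\phi^2}\,\alpha \approx \phi\,\alpha, \qquad \xi_2 \approx \frac{1+\phi^2}{\phi}\,\beta \approx \phi\,\beta, \qquad \xi_3 \approx (1+\phi^2)\,\Delta \approx \phi^2\,\Delta,$$

where the right-hand forms assume $\phi \gg 1$.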

C. (see what follows)

In this case the denominators in equations (2-10) become small,
and a look at those equations may cause some mathematicians
to worry. Let them rest easily, though. Physically this case
is impossible, since the conditions defining it imply $\phi \le 0$.
Negative focal lengths don't happen very often in practice.



Summarizing the results of this section, then, we have
shown that in facial co-ordinates, whenever

it is approximately true that

The translation of these results from the facial system to
the table system may be made through the use of

or

where $T$ is given by equation (2-2).



III. Search Limitation in the Pattern-Matching Problem

I really have no right to jump into this aspect of the
problem too far, because I don't know enough of the hardware
limitations and capabilities, but if my first assumption
holds, then what follows should not be too far off the track,
and since it's important, I should say something about it.
If light-intensity measurements are indeed representable
in a 2-D matrix for each eye, then informationally these
matrices, $M_L$ and $M_R$, will look as is shown in Fig. III.

The matching by local evidence of elements within these
matrices involves, I will assume, something procedurally
akin to:

1. Plucking a local region out of one matrix.

2. Choosing an untested region of the other matrix.

3. Overlaying the local region on top of it.

4. Evaluating their local differences.

5. Iterating steps 2-4 until some cutoff occurs.

6. Choosing the match with the smallest differences.
I can't really be less vague here without knowing more about
the hardware and noise aspects of the problem, but it's easy
to imagine something like a
"minimize-the-sum-of-the-squares-of-the-differences-over-several-elements"
approach, which would require that for some local region of
N x N (N odd) elements,

be minimized by adjusting k and l. In the minimum case,
k and l would then represent half the quantized (integer)
shifts which existed in the "combined" image matrix.
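A minimal sketch of such an evaluator, assuming both matrices are equal-sized lists of rows of floats and that the left and right windows are shifted symmetrically by $(+k, +l)$ and $(-k, -l)$, which is our reading of "half the quantized shifts". The search is exhaustive over the supplied ranges; the function names are hypothetical.

```python
import random

def ssd(m_left, m_right, i, j, k, l, half):
    """Sum of squared differences between two N x N windows (N = 2*half + 1).

    The left window is centred at (i + k, j + l) and the right window at
    (i - k, j - l), so (k, l) is half of the relative shift (assumed convention).
    """
    total = 0.0
    for di in range(-half, half + 1):
        for dj in range(-half, half + 1):
            diff = m_left[i + k + di][j + l + dj] - m_right[i - k + di][j - l + dj]
            total += diff * diff
    return total

def best_shift(m_left, m_right, i, j, n, k_values, l_values):
    """Try every (k, l) pair and return (cost, k, l) with the smallest SSD.

    Both matrices are assumed to have the same size. Returns None if no
    candidate window fits inside both matrices.
    """
    if n % 2 == 0:
        raise ValueError("window size N must be odd")
    half = n // 2
    rows, cols = len(m_left), len(m_left[0])
    best = None
    for k in k_values:
        for l in l_values:
            # Skip shifts whose windows would run off the edge of either matrix.
            if not (half <= i + k < rows - half and half <= i - k < rows - half):
                continue
            if not (half <= j + l < cols - half and half <= j - l < cols - half):
                continue
            cost = ssd(m_left, m_right, i, j, k, l, half)
            if best is None or cost < best[0]:
                best = (cost, k, l)
    return best

if __name__ == "__main__":
    # Synthetic example: the right matrix is the left one shifted by two
    # columns, so the symmetric half-shift should come out as (k, l) = (0, 1).
    rng = random.Random(0)
    base = [[rng.random() for _ in range(24)] for _ in range(12)]
    m_left = [row[0:22] for row in base]
    m_right = [row[2:24] for row in base]
    print(best_shift(m_left, m_right, 6, 10, 3, range(-1, 2), range(-3, 4)))
```

Exhaustive search is the simplest choice here; the observations that follow in the text are what would let the ranges of k and l be cut down.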

Regardless of the form taken by the 'evaluator' of step 4,
however, other more basic questions remain:

1. What is an acceptable cut-off criterion?

2. In what order do we vary k and l?

3. How good is the answer we get?

Without getting too involved, it's easy to make some relevant
observations using the results of Part II.

Equations (2-10), for instance, make use of the quantity

$(\alpha_l + \alpha_r)/2 = \alpha$

but do not make any reference to the analogous quantity

$(\alpha_l - \alpha_r)/2$.

This quantity may be shown, however, to be redundant and,
more importantly, quite small. From (2-9) and (2-11),

and it thus would be expected that

If we wish to let $k' = 2$, this becomes

$|\alpha\beta| < \phi\,\delta\,(k' - 1) = \phi\,\delta$,     (3-2)
Equation (3-2) defines a region (see Fig. IV) in the image
matrix, centered on the projection of the focal point at (0, 0)
and strictly bounded except along the axes. If we stay within
this region, we will be assured that shifts in the $\alpha$ direction
will be below the quantization level, and hence in our search
we may neglect all k except the trivial case of k = 0.
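A numerical check of this claim under assumed conventions of our own: $\bar\xi = (\bar X_o - \bar F)/d$, $\phi = f/d$, eyes at $\mp d$ on the j-axis, unit image distance. To first order the neglected vertical half-shift $(\alpha_l - \alpha_r)/2$ comes out close to $-\alpha\beta/\phi$, so it stays below $\delta$ inside the hyperbolic region $|\alpha\beta| < \phi\delta$; taking that as the form of (3-2) is our assumption. The function names are hypothetical.

```python
import math

def alpha_half_difference(xi, phi):
    """Return ((alpha_l - alpha_r)/2, alpha, beta) for xi = (X_o - F)/d.

    Assumed conventions: eyes at -d (left) and +d (right) on the j-axis,
    converged on the point at depth phi = f/d, image planes at unit distance.
    """
    x1, x2, x3 = xi
    c = 1.0 + phi * phi
    s = math.sqrt(c)
    depth_l = c + x2 + phi * x3
    depth_r = c - x2 + phi * x3
    a_l, a_r = s * x1 / depth_l, s * x1 / depth_r
    b_l, b_r = (phi * x2 - x3) / depth_l, (phi * x2 + x3) / depth_r
    return (a_l - a_r) / 2.0, (a_l + a_r) / 2.0, (b_l + b_r) / 2.0

def beta_search_suffices(alpha, beta, phi, delta):
    """True inside |alpha * beta| < phi * delta, our assumed form of (3-2)."""
    return abs(alpha * beta) < phi * delta

# Compare the exact vertical half-shift with the first-order estimate -alpha*beta/phi
half, alpha, beta = alpha_half_difference((0.3, -0.4, 0.5), 10.0)
print(half, -alpha * beta / 10.0)
```

Because the test depends only on the product $\alpha\beta$, a point far out along either image axis still passes it, which is why the region is unbounded along the axes.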
This limits our search within this region to one dimension
only, along $\beta$, and states that if ever we wish to
find $\Delta$ outside the region defined by (3-2),
we must either go to a more complicated search or else
shift the point of focus. Time tradeoffs based on speed
would seem easy to develop.

We can also note that the approximate value of $\Delta$
(the approximate value of the shift) should be predictable if
nearby shifts are already known, and if the region of 3-space
corresponding to the local regions being compared
contains no step discontinuities in k (depth).
$\Delta$ should vary in a continuous and piecewise-smooth manner
along $\beta$, except where 3-D object boundaries exist (see
Part IV). Remaining "on" an object, then, we would expect
that $\Delta$ would be almost or exactly the same for adjacent
points in the matrix,* and we thus not only find a natural
heuristic to help speed up search in these cases, but we
have an immediate flag which signals object boundaries.
Whether or not a high level of noise would trigger this
flag too often is something I haven't been able to
realistically figure out.



* This will always be true if we seek $\Delta$ in the following
order: start next to the focal point (where $\alpha = \beta = 0$),
and progress spirally outward. "Old" points will always be
adjacent in the direction of the origin and "behind".
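One concrete way to generate such an order, sketched in Python with a hypothetical function name: visit the square "rings" $\max(|i|, |j|) = r$ for $r = 0, 1, 2, \ldots$, each ring in order of angle. Every point in ring $r$ then has an 8-neighbour in ring $r - 1$, already visited and closer to the origin, which is the property the footnote relies on. This square spiral is our choice; a smoother spiral would serve equally well.

```python
import math

def spiral_path(radius):
    """Grid offsets (row, col) ordered ring by ring outward from (0, 0).

    Ring r holds the points with max(|row|, |col|) == r; within a ring the
    points are visited in order of angle, giving a square 'spiral'.
    """
    path = [(0, 0)]
    for r in range(1, radius + 1):
        ring = [(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)
                if max(abs(i), abs(j)) == r]
        ring.sort(key=lambda p: math.atan2(p[1], p[0]))
        path.extend(ring)
    return path
```

In practice the path would also be clipped to the region (3-2); clipping can break the neighbour property near the region's edge, so such points would simply wait until a "good" neighbour exists.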



As far as cut-off is concerned, it is sort of hard to
think of a heuristic that works equally well when the
predictions are "good" and when they are not.

We might test predictions by looking exhaustively at
some small number of points near the predicted one, and
if the differences seem to be increasing as we move
away in either direction, the prediction is probably "good",
and we can use the best match found from this small set of
data. If, on the other hand, the "match" doesn't seem
particularly good anywhere along this line, then it is
likely that we have crossed a discontinuity of depth in
the scene viewed, and we have to do something strange.
Perhaps it would be best to keep looking over greater and
greater areas for a match, but perhaps not, for it is quite
possible that a match cannot be found! One must remember,
after all, that near regions of depth discontinuity, one
eye sees things that are hidden to the other eye.

What I would then propose for a $\Delta$-finding algorithm
would look, in a more refined form, something like:

1. Pick a new point of focus.

2. Plan an outward-spiralling path of examination
beginning at the origin and remaining within the
region given by (3-2).

3. Pick the next point on the path adjacent to a
"good" point.

4. Predict $\Delta$ at this point based on nearby values.

5. Evaluate a small set of "overlays" shifted by about $2\Delta$.

6. If a definite best fit exists near the center of
this line, it is probably "good". Record the shift,
and go back to step 3.

7. Otherwise the shift is not good. Record this fact, and go
back to step 3 anyway.
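Steps 4 through 7 can be sketched as two small Python helpers with hypothetical names. Shifts are assumed to be integers in quantization units, `known` maps matrix positions to recorded shifts (or `None` where a shift was rejected), and `cost(s)` stands for whatever evaluator scores the overlay at shift `s`. A prediction is accepted only when the best score is a strict interior minimum, i.e. the differences increase as we move away from it in either direction, as suggested earlier in this Part.

```python
def predict_shift(known, point):
    """Step 4: average the recorded 'good' shifts of the 8 neighbours of point.

    known maps (row, col) -> integer shift, or None where a shift was rejected.
    Returns None when no good neighbour exists yet.
    """
    i, j = point
    values = [known[(i + di, j + dj)]
              for di in (-1, 0, 1) for dj in (-1, 0, 1)
              if (di, dj) != (0, 0) and known.get((i + di, j + dj)) is not None]
    if not values:
        return None
    return round(sum(values) / len(values))

def check_prediction(cost, predicted, width=2):
    """Steps 5-7: evaluate shifts predicted-width .. predicted+width.

    Accept the best shift only if it is an interior, strict local minimum,
    i.e. the differences grow as we move away from it in either direction.
    Returns the accepted shift, or None ('not good').
    """
    shifts = list(range(predicted - width, predicted + width + 1))
    costs = [cost(s) for s in shifts]
    b = min(range(len(shifts)), key=costs.__getitem__)
    if 0 < b < len(shifts) - 1 and costs[b] < costs[b - 1] and costs[b] < costs[b + 1]:
        return shifts[b]
    return None
```

With a cost of $(s - 3)^2$ and a prediction of 2, `check_prediction` accepts the shift 3; with a monotone cost it returns `None`, the "not good" outcome of step 7.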



As soon as no new points can be found at step 3, a region
will have been mapped out, and we can either stop or go
back to step 1, depending on the information we need. If
we go back, we record what we have been able to detect
before moving on.*

I have one final comment to make on "noise". In line-drawing
type programs, noise means extraneous variations in
light intensity relative to the 'average' over planar surfaces,
and so the easiest objects to work with are smooth and uniformly
colored. In the kind of pattern-recognition program I've
mentioned here, 'smooth and uniform' blocks are obviously
terrible to work with, for locally the only variations in
intensity are due to the inverse-square losses in light from
a point source. What we really want for a local pattern-matcher
is objects with lots of local detail (a light spray painting
might be good). The kind of noise we can't tolerate is
variation between the eyes when they look at the same small
region of space. Any 'noise' picked up consistently by both
eyes will only make the pattern matching (and depth perception)
more efficient.

* It is interesting to compare this algorithm with the much
more complete work of Lerman (1). Although derived independently,
and applicable to different eye configurations, both attempt
to deal with similar effects, and the reader is encouraged
to regard them as complementary. Basically, Lerman obtains
a set of possible 'matches' by comparing intensity differences
between the shifted image elements with a fixed cut-off,
and then refines this 'possible set' to remove ambiguities
and eliminate spurious points. His results are conceptually
encouraging, but when applied to actual images take a great deal
of time. If a shifting point of focus and a goal-oriented
measurement scheme were to be incorporated, it is possible
that a more widely applicable set of programs could be
generated.



IV. General Remarks and Figures

So far this has been pretty mathematical, and it may
be interesting (and instructive) to see what these results
look like when applied to human vision. People's eyes are
about 2" apart, and so in what follows, d = 1", and
distances may be interpreted either as dimensionless or in
inches, since the numbers come out the same either way.
($\alpha$, $\beta$, $\Delta$, $\phi$ are always dimensionless, however.)

If you hold a pencil up at arm's length ($\rho \approx 33$)
and focus on infinity, equation (2-15) predicts that

$\Delta \approx -1/33 \approx -.03$,

and since this is one half the difference between your right
eye's image and your left eye's, you should see the "two pencil
tips" shifted apart by about $2\Delta \approx -.06$. (The minus sign
claims that your right eye is responsible for the left image.
You can check this by blinking.) This gives you (or an
uninitiated vision computer) a handle on the size of the
$\beta$-scale, and I assume that the $\alpha$-scale is the same. (So
far the restriction to $\Delta \ll 1$ doesn't seem too limiting.)

Now try looking at something nearby and heavily textured,
like a flower or a crumpled piece of paper. When I did this
I found my eyes "jumped around" over the surface, making leaps
of about $|(\alpha, \beta)| \approx .03$ at $\rho \approx 10$. Equation (3-2) then fixes
my approximate* limit of resolution, $\delta$.

This is about 40 seconds of arc (the thickness of a piece
of newsprint at 20 feet), which sounds like the right ballpark
at least. Computer 'eyes' won't be able to keep up with this
kind of accuracy, and so we will have to expect some major
differences in performance from depth-sensitive programs
linked to any realistic hardware.

* I am assuming here that it is my 'depth-searcher' which is
driving my point of focus around the object. Actually it
might not be uniquely responsible, but on very irregular
objects it seems likely it would be important. Also, I have
assumed that my eyes 'jump' only when they reach the edge of
the region defined by (3-2). If something more conservative
was taking place, the limit of resolution would come out
smaller.

Pictures are also interesting, and I've included some
in the following pages which I've taken the trouble to draw
fairly accurately. One aside that strikes me as I look at
them is that parallel lines don't come out looking very
parallel, and yet I seem to remember the mention of some
heuristics which made use of parallelism in line drawings.
Perhaps their authors made different assumptions than I have.

These are line-drawing type pictures, even though a
depth-sensitive program would be just as happy with curves, and
wouldn't work at all without local detail in the planes
themselves. I hope this doesn't bother anyone; curves and
surfaces are hard to draw.

Figure V shows the scene from the top, as a perspectiveless
blueprint included only for clarity. The other figures are
self-explanatory. Things to look for include the sign and
magnitude of $\Delta$ over the image, the basic scale of the two axes
of the image, and the (very small) effects of $\Delta$ in the
prediction of $\alpha$. Whenever $\Delta - \beta^2/\phi$ is positive, the point
indicated is farther away than is the focal point; when the
reverse is true, it is closer. All "right eye" images are
shown dashed.
V. Error Analysis and Resolution Capability



Equations (2-10) represent solutions for relative 3-D
displacement in terms of continuous and precise values
of $\alpha$, $\beta$, $\Delta$, $\phi$. Approximations to these solutions
have been given in equations (2-14), and are valid
when the conditions (2-11) are satisfied. It remains
to be seen, however, just how accurate these approximations
are when fed the quantized data $\alpha$, $\beta$, $\Delta$, $\phi$ by a system
with limit of resolution $\delta$. This section will examine
such questions, and produce first-order error and uncertainty
estimates.

The errors inherent to equations (2-14) may be written
down directly as the difference between the two sets of
equations:

For any given values of $\alpha$, $\beta$, $\Delta$, $\phi$, these may be regarded
as functions of the variables

so that

when these variables are small.
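The first-order expansion can also be evaluated numerically when closed-form partials are inconvenient. The sketch below, with hypothetical names, estimates each partial derivative by a central difference and sums $|\partial f/\partial q_i|\,|\epsilon_i|$. As a test function it uses the on-axis depth relation $\rho = (\phi + \Delta)/(1 - \phi\Delta)$, which is our own assumption derived from the projection geometry, not one of the author's equations.

```python
def first_order_error(func, values, errors, step=1e-6):
    """First-order bound on |func(values + e) - func(values)| for |e_i| <= errors[i].

    The partial derivatives are estimated by central differences, so this is
    the sum of |d func / d q_i| * |e_i| evaluated at `values`.
    """
    total = 0.0
    for idx, err in enumerate(errors):
        up = list(values)
        down = list(values)
        up[idx] += step
        down[idx] -= step
        partial = (func(*up) - func(*down)) / (2.0 * step)
        total += abs(partial) * abs(err)
    return total

def on_axis_depth(delta_half, phi):
    """Depth rho of an on-axis point from Delta and phi (our reading of the geometry)."""
    return (phi + delta_half) / (1.0 - phi * delta_half)
```

At $\phi = 10$, $\Delta = 0$, `first_order_error(on_axis_depth, (0.0, 10.0), (0.001, 0.0))` gives a depth uncertainty of about 0.101 (in units of d) from an error of .001 in $\Delta$ alone.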

We can solve exactly for the partials in (5-3), and
produce the results in terms of measured quantities:

The constant term in (5-3) arises from the use of (2-11)
and the associated small-quantity approximation. To first
order, it may be written as



Combining (5-4) and (5-5), and setting

we obtain

or, approximately, with

a still simpler form,

With suitable restrictions on the use of the techniques
of measurement, it would be expected that the first two
terms in each of the above could be held arbitrarily
small. The fundamental limitation on accuracy in position
measurement, however, is fixed by the quantization level,
and is represented by the third terms, approximately:

While this limitation is small in its effect on horizontal
and vertical position measurements, its effect on range
resolution is not, and for $\delta = .001$, the limits of
range resolution near the focal point may be found as a
function of $\phi$ to be
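A first-order estimate of this limit can be sketched under our own reading of the geometry (on-axis point, $\phi = f/d$, $\Delta = (\beta_r - \beta_l)/2$, an uncertainty $\delta$ in $\Delta$): near the focal point $d\Delta/d\rho = 1/(1 + \phi^2)$, so the range uncertainty is about $(1 + \phi^2)\,\delta$ in units of d. This is an assumption-laden sketch, not the author's result, and the function name is hypothetical.

```python
def range_resolution(phi, delta=0.001):
    """First-order depth uncertainty (units of d) near the focal point.

    Assumes d(Delta)/d(rho) = 1/(1 + phi**2) at rho = phi (on-axis point,
    Delta = (beta_r - beta_l)/2), so an uncertainty delta in Delta maps to
    roughly (1 + phi**2) * delta in depth.
    """
    return (1.0 + phi * phi) * delta

for phi in (5.0, 10.0, 33.0):
    print(f"phi = {phi:5.1f}: range resolution about {range_resolution(phi):.3f} d")
```

For $\delta = .001$ this gives roughly 0.026, 0.101 and 1.09 (units of d) at $\phi$ = 5, 10 and 33; the quadratic growth in $\phi$ is why range, unlike horizontal and vertical position, suffers from quantization.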